Refining Inductive Bias in Unsupervised Learning via Constraints
Abstract
Algorithmic bias is necessary for learning because it allows a learner to generalize rationally. A bias is composed of all assumptions the learner makes outside of the given data set. There exist some approaches to automatically selecting the best algorithm (and therefore bias) for a problem, or to automatically shifting bias as learning proceeds. In general, these methods are concerned with supervised learning tasks. However, reducing reliance on supervisory tags or annotations enables the application of learning techniques to many real-world data sets for which no such information exists. We therefore propose the investigation of methods for refining the bias in unsupervised learning algorithms, with the goal of increasing accuracy and improving efficiency. In particular, we will investigate the incorporation of background knowledge in the form of constraints that allow an unsupervised algorithm to automatically avoid unpromising areas of the hypothesis space.

Background Knowledge as Constraints
There is a natural connection between the bias in an algorithm and background knowledge. Often, the bias hard-coded into an algorithm was chosen due to background knowledge about the class of tasks to be targeted. This bias encodes certain assumptions about what sort of hypotheses are valid solutions for any problem it is applied to. However, for a specific task it is often the case that more precise information is available that can be used to augment the bias in useful ways. In such cases, it is desirable to leverage this background knowledge to refine the algorithmic bias in the proper direction. In particular, we are interested in improvements that can be obtained with the addition of problem-specific constraints. Constraints are derived from background knowledge and specify relationships between instances that may not be expressible in the traditional feature-value representation used for machine learning data sets.

Current and Proposed Work
To date, we have investigated the incorporation of instance-level hard constraints into one clustering algorithm (a partitioning variation of COBWEB (Fisher 1987)). We found that incorporating constraints results in improved clustering accuracy (Wagstaff & Cardie, in press). The types of constraints investigated were specific to algorithms that create flat partitions of the input data. We plan to investigate the relative merits of dif-
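The abstract mentions instance-level hard constraints without spelling out their form; in the cited work they are typically pairwise relations stating that two instances must or must not share a cluster. The sketch below is a minimal illustration of that idea under those assumptions, not the COBWEB-based algorithm from the proposal: it shows how such constraints can veto otherwise nearest-cluster assignments in a simple partitioning pass. The function names (`violates_constraints`, `constrained_assign`) and the toy data are hypothetical.

```python
# Illustrative sketch only: the proposal's experiments used a partitioning
# variant of COBWEB; here the same idea (instance-level hard constraints
# vetoing cluster assignments) is shown in a simpler nearest-center pass.
import numpy as np

def violates_constraints(i, cluster, assignments, must_link, cannot_link):
    """Return True if placing instance i into `cluster` breaks a hard constraint."""
    for a, b in must_link:
        other = b if a == i else a if b == i else None
        if other is not None and assignments[other] is not None \
                and assignments[other] != cluster:
            return True
    for a, b in cannot_link:
        other = b if a == i else a if b == i else None
        if other is not None and assignments[other] == cluster:
            return True
    return False

def constrained_assign(X, centers, must_link, cannot_link):
    """One assignment pass: each instance goes to the nearest center that
    does not violate any constraint (None if every cluster is vetoed)."""
    assignments = [None] * len(X)
    for i, x in enumerate(X):
        order = np.argsort([np.linalg.norm(x - c) for c in centers])
        for cluster in order:
            if not violates_constraints(i, cluster, assignments,
                                         must_link, cannot_link):
                assignments[i] = int(cluster)
                break
    return assignments

if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
    centers = np.array([[0.0, 0.0], [5.0, 5.0]])
    # Background knowledge: instances 1 and 2 must share a cluster,
    # while instances 0 and 1 must not.
    print(constrained_assign(X, centers, must_link=[(1, 2)], cannot_link=[(0, 1)]))
```

In this toy run, instance 1 is assigned to the distant cluster despite lying next to instance 0, showing how a single cannot-link constraint steers the learner away from a region of the hypothesis space that its default bias would otherwise prefer.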
Similar Resources
Controlling Complexity in Part-of-Speech Induction
We consider the problem of fully unsupervised learning of grammatical (part-of-speech) categories from unlabeled text. The standard maximum-likelihood hidden Markov model for this task performs poorly, because of its weak inductive bias and large model capacity. We address this problem by refining the model and modifying the learning objective to control its capacity via parametric and non-para...
Exploiting Inductive Bias Shift in Knowledge Acquisition from Ill-Structured Domains
Machine Learning (ML) methods are very powerful tools to automate the knowledge acquisition (KA) task. Particularly, in ill-structured domains where there is no clear idea about which concepts exist, inductive unsupervised learning systems appear to be a promising approach to help experts in the early stages of the acquisition process. In this paper we examine the concept of inductive bias, whi...
A Survey of Inductive Biases for Factorial Representation-Learning
With the resurgence of interest in neural networks, representation learning has re-emerged as a central focus in artificial intelligence. Representation learning refers to the discovery of useful encodings of data that make domain-relevant information explicit. Factorial representations identify underlying independent causal factors of variation in data. A factorial representation is compact an...
Inductive Hypothesis Validation and Bias Selection in Unsupervised Learning
This paper addresses the importance of bias selection in the context of validating Knowledge Bases (KBs) obtained by inductive learning systems. We propose a framework for automatic validation of induced KBs based on the capability of shifting the bias in the inductive learning system. We claim that this framework is useful not only when the system has to validate its own results, but also when...
On Analytical and Similarity-Based Classification
This paper is concerned with knowledge representation issues in machine learning. In particular, it presents a representation language that supports a hybrid analytical and similarity-based classification scheme. Analytical classification is produced using a KL-ONE-like term-subsumption strategy, while similarity-based classification is driven by generalizations induced from a training set by a...